Structured Support Vector Machines for Speech Recognition
Abstract
Discriminative training criteria and discriminative models are two effective improvements for HMM-based speech recognition. This thesis proposes a structured support vector machine (SSVM) framework suitable for medium to large vocabulary continuous speech recognition. An important aspect of structured SVMs is the form of the features. Several features previously proposed in the field are summarized in this framework. Since some of these features can be extracted based on generative models, this provides an elegant way of combining generative and discriminative models. To apply structured SVMs to continuous speech recognition, a number of issues need to be addressed. First, the features require a segmentation to be specified. To incorporate the optimal segmentation into the training process, the training algorithm is modified to make use of the concave-convex optimisation procedure. A Viterbi-style algorithm is described for inferring the optimal segmentation based on the discriminative parameters. Second, structured SVMs can be viewed as large margin log-linear models using a zero-mean Gaussian prior on the discriminative parameters. However, this form of prior is not appropriate for all features. An extended training algorithm is proposed that allows general Gaussian priors to be incorporated into the large margin criterion. Third, to speed up the training process, strategies of parameter tying, 1-slack optimisation, caching competing hypotheses, lattice-constrained search and parallelization are also described. Finally, to avoid explicit computation in the high-dimensional feature space and to achieve nonlinear decision boundaries, kernel-based training and decoding algorithms are also proposed. The performance of structured SVMs is evaluated on small and medium to large vocabulary speech recognition tasks: AURORA 2 and 4.
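The Viterbi-style segmentation search mentioned in the abstract can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the thesis's algorithm: the label sequence is taken as fixed, each segment is scored independently by a caller-supplied function (standing in for a dot product between the discriminative parameters and a segment-level feature vector), and the names `best_segmentation`, `toy_segment_score`, `FRAMES` and `MEANS` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start frame, end frame exclusive, label)


def best_segmentation(
    num_frames: int,
    labels: Sequence[str],
    segment_score: Callable[[int, int, str], float],
) -> Tuple[float, List[Segment]]:
    """Find the highest-scoring split of frames [0, num_frames) into
    len(labels) consecutive non-empty segments, one per label, in order."""
    num_labels = len(labels)
    neg_inf = float("-inf")
    # best[k][t]: best score of covering frames [0, t) with the first k labels.
    best = [[neg_inf] * (num_frames + 1) for _ in range(num_labels + 1)]
    # back[k][t]: start frame of the k-th segment on that best path.
    back = [[0] * (num_frames + 1) for _ in range(num_labels + 1)]
    best[0][0] = 0.0

    for k in range(1, num_labels + 1):
        for end in range(k, num_frames + 1):
            for start in range(k - 1, end):
                prev = best[k - 1][start]
                if prev == neg_inf:
                    continue
                score = prev + segment_score(start, end, labels[k - 1])
                if score > best[k][end]:
                    best[k][end] = score
                    back[k][end] = start

    if best[num_labels][num_frames] == neg_inf:
        raise ValueError("no valid segmentation (more labels than frames?)")

    # Trace the boundaries back from the final frame.
    segments: List[Segment] = []
    end = num_frames
    for k in range(num_labels, 0, -1):
        start = back[k][end]
        segments.append((start, end, labels[k - 1]))
        end = start
    segments.reverse()
    return best[num_labels][num_frames], segments


# Toy data (assumed for illustration): 1-D "frames" and one mean per label.
FRAMES = [0.1, 0.0, 0.9, 1.1, 1.0]
MEANS = {"a": 0.0, "b": 1.0}


def toy_segment_score(start: int, end: int, label: str) -> float:
    """Negative squared distance of the segment's frames to the label mean."""
    return -sum((x - MEANS[label]) ** 2 for x in FRAMES[start:end])


if __name__ == "__main__":
    print(best_segmentation(len(FRAMES), ["a", "b"], toy_segment_score))
```

In a real system the segment score would come from the trained model and the search would typically be constrained by a lattice, as the abstract notes; the cubic-time loop here is kept only for readability.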
Similar resources
Face Recognition using Eigenfaces , PCA and Supprot Vector Machines
This paper is based on a combination of principal component analysis (PCA), eigenfaces and support vector machines. Using the N-fold method and with respect to the value of N, each person's face images are divided into two sections. As a result, vectors of training features and test features are obtained. Classification precision and accuracy were examined with three different types of kernel and...
Structured Support Vector Machines for Noise Robust Continuous Speech Recognition
The use of discriminative models is an interesting alternative to generative models for speech recognition. This paper examines one form of these models, structured support vector machines (SVMs), for noise robust speech recognition. One important aspect of structured SVMs is the form of the joint feature space. In this work features based on generative models are used, which allows model-based...
A Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
Visual Speech Recognition Using Support Vector Machines
In this paper we propose a visual speech recognition network based on Support Vector Machines. Each word of the dictionary is described as a temporal sequence of visemes. Each viseme is described by a support vector machine, and the temporal character of speech is modeled by integrating the support vector machines as nodes into a Viterbi decoding lattice. Experiments conducted on a small visual...
Application of support vector machines classifiers to visual speech recognition
In this paper we proposed a visual speech recognition network based on Support Vector Machines. Each word of the dictionary is modeled by a set of temporal sequences of visemes. Each viseme is described by a support vector machine, and the temporal character of speech is modeled by integrating the support vector machines as nodes into Viterbi decoding lattices. Experiments conducted on a small ...